• Balanced tree structures we know at this point: red-black trees, B-trees, treaps.

• Could you implement them right now? Probably, with time. . . but without looking up any 
details? 

• Skip lists are a simple randomized structure you'll never forget. 



Starting from scratch 

• Initial goal: just searches — ignore updates (Insert/Delete) for now 

• Simplest data structure: linked list 

• Sorted linked list: Θ(n) time

• 2 sorted linked lists: 

- Each element can appear in 1 or both lists 

- How to speed up search? 

- Idea: Express and local subway lines 



- Example: [14], 23, [34], [42], 50, 59, 66, [72], 79, 86, [96], 103, 110, 116, 125

(What is this sequence?) 



Boxed values are "express" stops; others are normal stops 



- Can quickly jump from express stop to next express stop, or from any stop to next 
normal stop 

- Represented as two linked lists, one for express stops and one for all stops: 
- Every element is in the bottom linked list ($L_2$); some elements are also in the top linked list ($L_1$)

- Link equal elements between the two levels

- To search, first search in $L_1$ until about to go too far, then go down and search in $L_2$






Handout 17: Lecture Notes on Skip Lists



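- Cost:

\[ |L_1| + \frac{|L_2|}{|L_1|} = |L_1| + \frac{n}{|L_1|} \]

- Minimized when

\[ |L_1| = \frac{n}{|L_1|} \;\Rightarrow\; |L_1|^2 = n \;\Rightarrow\; |L_1| = \sqrt{n} \;\Rightarrow\; \text{search cost} = 2\sqrt{n} \]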



- Resulting 2-level structure: 
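The two-level search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `two_level_search` is hypothetical, Python lists stand in for linked lists, and the express stops are taken to be every ⌊√n⌋-th element rather than the boxed stops of the subway example.

```python
import math

def two_level_search(sorted_items, target):
    """Search a sorted sequence using an 'express' list over every ~sqrt(n)-th item.

    Returns (found, steps), where steps counts the elements examined in both
    lists. Indices stand in for the links between equal elements (assumption).
    """
    n = len(sorted_items)
    step = max(1, math.isqrt(n))
    express = range(0, n, step)  # positions of the express stops

    steps = 0
    pos = 0
    # Ride the express list until the next stop would go too far.
    for idx in express:
        steps += 1
        if sorted_items[idx] > target:
            break
        pos = idx

    # Go down and continue along the bottom list.
    while pos < n and sorted_items[pos] <= target:
        steps += 1
        if sorted_items[pos] == target:
            return True, steps
        pos += 1
    return False, steps

stops = [14, 23, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110, 116, 125]
print(two_level_search(stops, 72))
```

With about √n express stops and about √n elements between consecutive express stops, the step count stays near 2√n, matching the cost computed above.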




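• 3 linked lists: $3 \cdot \sqrt[3]{n}$

• k linked lists: $k \cdot \sqrt[k]{n}$

• lg n linked lists: $\lg n \cdot \sqrt[\lg n]{n} = \lg n \cdot n^{1/\lg n} = \lg n \cdot 2 = \Theta(\lg n)$, since $n^{1/\lg n} = 2$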

- Becomes like a binary tree:
- (In fact, a level-linked B+-tree; see Problem Set 5.)

- Example: Search for 72 

* Level 1: 14 too small, 79 too big; go down 14 

* Level 2: 14 too small, 50 too small, 79 too big; go down 50 

* Level 3: 50 too small, 66 too small, 79 too big; go down 66 

* Level 4: 66 too small, 72 spot on 
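The search for 72 can be replayed with a small sketch. The figure showing the ideal structure did not survive, so the contents of `levels` below are an assumption chosen to be consistent with the trace above; `skip_search` is a hypothetical name, and plain Python lists stand in for the linked lists.

```python
def skip_search(levels, target):
    """Search top-down through sorted lists; levels[0] is the sparsest list.

    Assumes every level starts with the same smallest key and that each level
    is a subset of the level below it. Returns (found, trace), where trace
    holds the keys examined on each level.
    """
    trace = []
    current = levels[0][0]
    for level in levels:
        i = level.index(current)  # stands in for following the down-link
        seen = [level[i]]
        # Move right while the next key is not too big.
        while level[i] != target and i + 1 < len(level) and level[i + 1] <= target:
            i += 1
            seen.append(level[i])
        if level[i] == target:
            trace.append(seen)
            return True, trace
        if i + 1 < len(level):
            seen.append(level[i + 1])  # the key that was too big
        trace.append(seen)
        current = level[i]  # go down from here
    return False, trace

# Hypothetical level contents, consistent with the search trace in the notes.
levels = [
    [14, 79],
    [14, 50, 79, 110],
    [14, 34, 50, 66, 79, 96, 110, 125],
    [14, 23, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110, 116, 125],
]
found, trace = skip_search(levels, 72)
for depth, keys in enumerate(trace, start=1):
    print(f"Level {depth}: {keys}")
```

Each level contributes only a constant number of comparisons here because each list holds roughly every other element of the list below it.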
Insert

• New element should certainly be added to bottommost level 
(Invariant: Bottommost list contains all elements) 

• Which other lists should it be added to? 

(Is this the entire balance issue all over again?) 

• Idea: Flip a coin 

- With what probability should it go to the next level? 

- To mimic a balanced binary tree, we'd like half of the elements to advance to the next-to-bottommost level

- So, when you insert an element, flip a fair coin 

- If heads: add element to next level up, and flip another coin (repeat) 

• Thus, on average: 

- 1/2 the elements go up 1 level 

- 1/4 the elements go up 2 levels 

- 1/8 the elements go up 3 levels 

- Etc. 

• Thus, "approximately even" 
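The promotion rule is easy to check empirically. The sketch below is an illustration, not part of the handout; the names `random_height` and `promotion_fractions` are hypothetical, and a seeded pseudo-random generator stands in for the fair coin.

```python
import random

def random_height(rng):
    """Count how many levels above the bottom an element is promoted."""
    height = 0
    while rng.random() < 0.5:  # heads: go up one level and flip again
        height += 1
    return height

def promotion_fractions(n, max_level, seed=0):
    """Fraction of n elements that go up at least k levels, for k = 1..max_level."""
    rng = random.Random(seed)
    heights = [random_height(rng) for _ in range(n)]
    return [sum(h >= k for h in heights) / n for k in range(1, max_level + 1)]

for k, fraction in enumerate(promotion_fractions(100_000, 3), start=1):
    print(f"up at least {k} level(s): {fraction:.3f}")
```

The printed fractions should land close to 1/2, 1/4, and 1/8, which is the "approximately even" behavior the coin flips are meant to produce.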

Example 

• Get out a real coin and try an example 

• You should put a special value −∞ at the beginning of each list, and always promote this
special value to the highest level of promotion

• This forces the leftmost element to be present in every list, which is necessary for searching 

. . . many coins are flipped . . . 
(Isn't this easy?) 

• The result is a skip list. 

• It probably isn't as balanced as the ideal configurations drawn above. 

• It's clearly good on average. 

• Claim it's really really good, almost always. 
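For readers who would rather let a program flip the coins, here is a minimal skip list sketch with Insert and Search. The class and method names are hypothetical, deletion is omitted, and the node layout (one array of forward pointers per node) is one common representation, not necessarily the one drawn in lecture.

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] = successor on level i (0 = bottom)

class SkipList:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.head = Node(float("-inf"), 1)  # the special -infinity value

    def _random_height(self):
        height = 1  # invariant: every element is in the bottommost list
        while self.rng.random() < 0.5:  # heads: add to next level up, flip again
            height += 1
        return height

    def search(self, key):
        node = self.head
        for level in reversed(range(len(self.head.next))):
            # Move right until about to go too far, then go down.
            while node.next[level] is not None and node.next[level].key <= key:
                node = node.next[level]
        return node.key == key

    def insert(self, key):
        height = self._random_height()
        # Promote -infinity so it is present in every list.
        while len(self.head.next) < height:
            self.head.next.append(None)
        new = Node(key, height)
        node = self.head
        for level in reversed(range(len(self.head.next))):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            if level < height:  # splice into each list the element was promoted to
                new.next[level] = node.next[level]
                node.next[level] = new

    def bottom(self):
        """Keys of the bottommost list, left to right."""
        keys, node = [], self.head.next[0]
        while node is not None:
            keys.append(node.key)
            node = node.next[0]
        return keys

sl = SkipList(seed=1)
for key in [50, 14, 79, 23, 72, 66]:
    sl.insert(key)
print(sl.bottom(), sl.search(72), sl.search(60))
```

No rebalancing code appears anywhere: the coin flips in `_random_height` are the only balancing mechanism.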
Analysis: Claim of With High Probability

• Theorem: With high probability, every search costs $O(\lg n)$ in a skip list with n elements

• What do we need to do to prove this? [Calculate the probability, and show that it's high!] 

• We need to define the notion of "with high probability"; this is a powerful technical notion, 
used throughout randomized algorithms 

• Informal definition: An event $E$ occurs with high probability if, for any $\alpha \ge 1$, there is an
appropriate choice of constants for which $E$ occurs with probability at least $1 - O(1/n^\alpha)$

• In reality, the constant hidden within $O(\lg n)$ in the theorem statement actually depends on c.

• Precise definition: A (parameterized) event $E_\alpha$ occurs with high probability if, for any
$\alpha \ge 1$, $E_\alpha$ occurs with probability at least $1 - c_\alpha/n^\alpha$, where $c_\alpha$ is a "constant" depending
only on $\alpha$.

• The term $O(1/n^\alpha)$, or more precisely $c_\alpha/n^\alpha$, is called the error probability

• The idea is that the error probability can be made very, very small by setting $\alpha$ to
something big, e.g., 100

Analysis: Warmup 

• Lemma: With high probability, a skip list with n elements has $O(\lg n)$ levels

• (In fact, the number of levels is $\Theta(\log n)$, but we only need an upper bound.)

• Proof:

- $\Pr\{\text{element } x \text{ is in more than } c \lg n \text{ levels}\} = 1/2^{c \lg n} = 1/n^c$

- Recall Boole's inequality / union bound:

\[ \Pr\{E_1 \cup E_2 \cup \cdots \cup E_k\} \le \Pr\{E_1\} + \Pr\{E_2\} + \cdots + \Pr\{E_k\} \]

- Applying this inequality:

\[ \Pr\{\text{any element is in more than } c \lg n \text{ levels}\} \le n \cdot 1/n^c = 1/n^{c-1} \]

- Thus, the error probability is polynomially small, and the exponent ($\alpha = c - 1$) can be made
arbitrarily large by an appropriate choice of the constant in the level bound of $O(\lg n)$
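As a sanity check on the lemma, the sketch below evaluates the union bound numerically and compares it with one simulated skip list. The function names are hypothetical, and the choice n = 1024, c = 4 is only an example.

```python
import math
import random

def union_bound(n, c):
    """Bound n * (1/2)^(c lg n) = 1/n^(c-1) on Pr{any element exceeds c lg n levels}."""
    return n * 0.5 ** (c * math.log2(n))

def max_promotions(n, rng):
    """Largest number of consecutive heads seen by any of n elements."""
    best = 0
    for _ in range(n):
        heads = 0
        while rng.random() < 0.5:
            heads += 1
        best = max(best, heads)
    return best

n, c = 1024, 4
print("level bound c lg n:", c * math.log2(n))
print("error probability bound:", union_bound(n, c))
print("max promotions in one simulated list:", max_promotions(n, random.Random(0)))
```

An element is in more than k levels exactly when its first k flips are all heads, which is why counting consecutive heads is enough here.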
Analysis: Proof of Theorem

• Cool idea: Analyze search backwards — from leaf to root 

- Search starts at leaf (element in bottommost level) 

- At each node visited: 

* If node wasn't promoted higher (got tails here), then we go [came from] left

* If node was promoted higher (got heads here), then we go [came from] up

- Search stops at root of tree

• Know height is $O(\lg n)$ with high probability; say it's $c \lg n$

• Thus, the number of "up" moves is at most $c \lg n$ with high probability

• Thus, search cost is at most the following quantity:

How many times do we need to flip a coin to get $c \lg n$ heads?

• Intuitively, $\Theta(\lg n)$
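The intuition can be checked by simulation. In the sketch below (hypothetical function names, seeded generator in place of a real coin), `target_heads` plays the role of $c \lg n$; a fair coin needs two flips per head in expectation, so the average should be close to twice the target.

```python
import random

def flips_until_heads(target_heads, rng):
    """Flip a fair coin until target_heads heads have appeared; return the flip count."""
    flips = heads = 0
    while heads < target_heads:
        flips += 1
        if rng.random() < 0.5:
            heads += 1
    return flips

def average_flips(target_heads, trials, seed=0):
    rng = random.Random(seed)
    total = sum(flips_until_heads(target_heads, rng) for _ in range(trials))
    return total / trials

# Example: n = 1024 and c = 1, so c lg n = 10 heads are needed.
print(average_flips(10, 20_000))
```

An average says nothing about the tail, which is why the analysis goes on to bound the probability of needing many more flips than this.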



Analysis: Coin Flipping 

• Claim: Number of flips till $c \lg n$ heads is $\Theta(\lg n)$ with high probability

• Again, the constant in the $\Theta(\lg n)$ bound will depend on $\alpha$

• Proof of claim: 



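- Say we make $10c \lg n$ flips

- When are there at least $c \lg n$ heads?

- Probability of exactly $c \lg n$ heads (orders: HHHTTT vs. HTHTHT):

\[ \Pr\{\text{exactly } c \lg n \text{ heads}\} = \underbrace{\binom{10c \lg n}{c \lg n}}_{\text{orders}} \cdot \underbrace{\left(\frac{1}{2}\right)^{c \lg n}}_{\text{heads}} \cdot \underbrace{\left(\frac{1}{2}\right)^{9c \lg n}}_{\text{tails}} \]

- Probability of at most $c \lg n$ heads:

\[ \Pr\{\text{at most } c \lg n \text{ heads}\} \le \underbrace{\binom{10c \lg n}{c \lg n}}_{\text{overestimate on orders}} \cdot \underbrace{\left(\frac{1}{2}\right)^{9c \lg n}}_{\text{tails}} \]

- Recall bounds on $\binom{y}{x}$:

\[ \left(\frac{y}{x}\right)^x \le \binom{y}{x} \le \left(e\,\frac{y}{x}\right)^x \]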
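- Applying this formula to the previous equation:

\[
\begin{aligned}
\Pr\{\text{at most } c \lg n \text{ heads}\}
&\le \binom{10c \lg n}{c \lg n} \left(\frac{1}{2}\right)^{9c \lg n} \\
&\le \left(e\,\frac{10c \lg n}{c \lg n}\right)^{c \lg n} \left(\frac{1}{2}\right)^{9c \lg n} \\
&= (10e)^{c \lg n} \, 2^{-9c \lg n} \\
&= 2^{\lg(10e) \cdot c \lg n} \, 2^{-9c \lg n} \\
&= 2^{(\lg(10e) - 9)\, c \lg n} \\
&= 1/n^{\alpha}, \quad \text{where } \alpha = [9 - \lg(10e)] \cdot c
\end{aligned}
\]

- The point here is that, as 10 → ∞, $\alpha = [9 - \lg(10e)] \cdot c \to \infty$, independent of (for all) c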
• End of proof of claim and theorem 
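The final bound can be compared against the exact probability for small cases. The sketch below is an illustration with hypothetical function names; `m` stands for $c \lg n$, assumed to be an integer (for example, n a power of two).

```python
import math

def exact_tail(m):
    """Exact Pr{at most m heads in 10*m fair coin flips}."""
    favorable = sum(math.comb(10 * m, i) for i in range(m + 1))
    return favorable / 2 ** (10 * m)

def proof_bound(m):
    """The bound (10e)^m * 2^(-9m) = 2^((lg(10e) - 9) * m) from the derivation."""
    return 2.0 ** ((math.log2(10 * math.e) - 9) * m)

for m in (5, 10, 20):
    print(m, exact_tail(m), proof_bound(m))

# Exponent per unit of c: alpha / c = 9 - lg(10e)
print(9 - math.log2(10 * math.e))
```

The exact probability sits well below the bound in these cases; the derivation trades tightness for a closed form that is visibly polynomially small in n.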